<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta name="keywords" content="remote machines, Git, GitHub" />
    <meta name="description" content="Lecture introducing remote machines and version control using Git." />
    <title>BCB 546X -- 17 Jan 2017</title>
    <style>
      @import url(https://fonts.googleapis.com/css?family=Droid+Serif);
      @import url(https://fonts.googleapis.com/css?family=Yanone+Kaffeesatz);
      @import url(https://fonts.googleapis.com/css?family=Ubuntu+Mono:400,700,400italic);

      body {
        font-family: 'Droid Serif';
      }
      h1, h2, h3 {
        font-family: 'Yanone Kaffeesatz';
        font-weight: 400;
        margin-bottom: 0;
      }
      .remark-slide-content h1 { font-size: 3em; }
      .remark-slide-content h2 { font-size: 2em; }
      .remark-slide-content h3 { font-size: 1.6em; }
      .footnote {
        position: absolute;
        bottom: 3em;
      }
      li p { line-height: 1.25em; }
      .red { color: #fa0000; }
      .large { font-size: 2em; }
      a, a > code {
        color: rgb(249, 38, 114);
        text-decoration: none;
      }
      code {
        background: #e7e8e2;
        border-radius: 5px;
      }
      .remark-code, .remark-inline-code { font-family: 'Ubuntu Mono'; }
      .remark-code-line-highlighted     { background-color: #373832; }
      .pull-left {
        float: left;
        width: 47%;
      }
      .pull-right {
        float: right;
        width: 47%;
      }
      .pull-right ~ p {
        clear: both;
      }
      #slideshow .slide .content code {
        font-size: 0.8em;
      }
      #slideshow .slide .content pre code {
        font-size: 0.9em;
        padding: 15px;
      }
      .inverse {
        background: #272822;
        color: #777872;
        text-shadow: 0 0 20px #333;
      }
      .inverse h1, .inverse h2 {
        color: #f3f3f3;
        line-height: 0.8em;
      }
      /* Two-column layout */
      .left-column {
        color: #777;
        width: 20%;
        height: 92%;
        float: left;
      }
        .left-column h2:last-of-type, .left-column h3:last-child {
          color: #000;
        }
      .right-column {
        width: 75%;
        float: right;
        padding-top: 1em;
      }
    </style>
  </head>
  <body>
    <textarea id="source">

class: center, middle

# Remote Machines & Intro to Git

???

Notes for the _first_ slide!

---

# Agenda

## Remote machines and SSH

## Version control
* What is version control?
* Examples
* Version control systems and **Git**
* Git introduction and demo


---

# Remote Machines

### What are they?

--

* Computers that are not locally available, but accessable via log-in over some network.

--

* Remote access allows you to control the computer even if you're not directly connected to it.

--

* For analysis of biological data, these computers are often high-performance computing clusters.

---
# Remote Machines

### Why do we need remote machines?

--
```
¯\_(ツ)_/¯
```
---

# Remote Machines

### Why do we need remote machines?

* You can't run analysis software on your laptop. HPC resources are much more powerful than a single user's personal computer.

* Machines are often hosted by your institution as a shared resource, installed and maintained by experts, and backed up regularly.

* Remote servers are used to host publicly available software and data, thus enhancing reproducibility. 

---

# Accessing Remote Machines

## Secure Shell (SSH)

Provides a secure (encrypted) channel over the network for users to access remote machines. 
--

```
$ ssh phylo@speedy.ent.iastate.edu
Password:
Last login: Wed Dec 7 15:50:28 2016 from phylo-2015.eeob.iastate.edu

phylo@speedy $
```
--

[Speedy](http://www.biology-it.iastate.edu/speedy) is a 24-core remote machine managed by [Biology Information Technology](http://www.biology-it.iastate.edu/)

--

*All remote machines hosted by Iowa State University cannot be accessed outside of the ISU network unless you use the [VPN](https://www.it.iastate.edu/howtos/vpn)
---


# Accessing Remote Machines

## What can I do once I've logged on? 

```
$ ssh phylo@speedy.ent.iastate.edu
Password:
Last login: Wed Dec 7 15:50:28 2016 from phylo-2015.eeob.iastate.edu

phylo@speedy $ ¯\_(ツ)_/¯
```
--

Most remote machines run Unix-based operating systems. So you can navigate your file system, run programs, or anything you'd do on such a computer (within the limits of your permissions).

Let me show you...

---

# Accessing Remote Machines

## Streamline remote access: Storing frequent SSH hosts

In Unix systems, you have a file that allows you to create short-cuts to your remote machines: `~/.ssh/config`

This is mine:

```
Host montgomery
	Hostname 129.237.138.231
Host jasper
	Hostname 129.237.138.173
Host baboy
	Hostname bayes4.biol.berkeley.edu
Host speedy
	Hostname speedy.ent.iastate.edu
Host speedy2
	Hostname speedy2.ent.iastate.edu
Host condo
	Hostname condo.its.iastate.edu
```

Each `Host` is a shortcut to the URI of a specific remote machine.

---

# Accessing Remote Machines

## Streamline remote access: Storing frequent SSH hosts

In Unix systems, you have a file that allows you to create short-cuts to your remote machines: `~/.ssh/config`

```
Host speedy
	Hostname speedy.ent.iastate.edu
```
--

With the host names in the `.ssh/config` file, I don't have to type the whole URI to log in:

```
$ ssh phylo@speedy 
Password:
Last login: Wed Dec 7 15:50:28 2016 from phylo-2015.eeob.iastate.edu

phylo@speedy $ 
```

---

# Accessing Remote Machines

## Streamline remote access: Authentication with a SSH key

Passwords are not always secure and can be annoying to type.

--

* SSH keys are much more secure and allow you to log in without typing your password 
(or just a different, simpler passphrase).

--

* When you generate a key, you create two things: a public key and a private key.

--

* You place the public key on any server and then unlock it by connecting to it with a client that already has the private key.

--

* When the two match up, the system unlocks without the need for a password.

--

* SSH keys are also very important for using Git with remote hosts (e.g., GitHub)

---

# Accessing Remote Machines

## Streamline remote access: Authentication with a SSH key

Generate your key in your home directory: `cd ~`

--

Create the key pair:

```
$ ssh-keygen -t rsa
```
--

Once the `ssh-keygen` command has been issued, you will be asked a few questions:

--

```
Enter file in which to save the key (/home/phylo/.ssh/id_rsa):
```

You can just hit _enter_ here and it should save it to the file path given.

--

```
Enter passphrase (empty for no passphrase):
```

Here is where you decide if you want to password-protect your key. The downside, 
to having a passphrase, is then having to type it in each time you use the Key Pair. 

---

# Accessing Remote Machines

## Streamline remote access: Authentication with a SSH key

```
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/phylo/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/phylo/.ssh/id_rsa.
Your public key has been saved in /home/phylo/.ssh/id_rsa.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 phylo@Arrow
The key's randomart image is:
+--[ RSA 2048]----+
|          .oo.   |
|         .  o.E  |
|        + .  o   |
|     . = = .     |
|      = S = .    |
|     o + = +     |
|      . o + o .  |
|           . o   |
|                 |
+-----------------+
```

---

# Accessing Remote Machines

## Streamline remote access: Authentication with a SSH key

With the key-pair, it's now quite simple to access `speedy`.

```
$ ssh phylo@speedy 
Last login: Wed Dec 7 15:50:28 2016 from phylo-2015.eeob.iastate.edu

phylo@speedy $ 
```

---

# Tracking Your Science

## Saving revisions is very important.

## What kind of versioning do most of you use?

```
¯\_(ツ)_/¯
```
---

# Tracking Your Science

### Many biologists use the "multiple-file" system with cloud-based file sharing (e.g., Dropbox, Google Drive, Box) 

--

* Example: A grant proposal that involved 5-6 contributors:

--

```
3111123 Aug  2  2015 NSF_Project_Description_Aug2-4PM.pdf
2824531 Aug  2  2015 NSF Panama Grant Draft 2015-08-01_v2_JN.docx
2366621 Aug  2  2015 NSF EAH final revisions 2015-08-02_fin.docx
1581164 Aug  2  2015 NSF Panama Grant Draft 2015-08-01_TAH_working.docx
2908139 Aug  2  2015 NSF EAH revisions 2015-08-02.docx
  32538 Aug  2  2015 Hypothesis4edited by Aafke 010815 RAR.docx
 164430 Aug  1  2015 Hypothesis4edited by Aafke 010815.docx
2905973 Aug  1  2015 NSF Panama Grant Draft 2015-08-01.docx
 157777 Aug  1  2015 Chemical volatile objectives.docx
 708115 Jul 31  2015 NSF Panama Grant Draft 2015-07-31 alt fig.docx
1086603 Jul 31  2015 NSF Panama Grant Draft 2015-07-30 alt fig.docx
 157636 Jul 31  2015 H2 Text.docx
 134487 Jul 30  2015 H1 Text.docx
 688048 Jul 30  2015 NSF Panama Grant Draft 2015-07-30.docx
 509467 Jul 30  2015 NSF Panama Grant Draft 2015-07-29.docx
 482474 Jul 28  2015 NSF Panama Grant Draft 2015-07-28.docx
 241316 Jul 27  2015 NSF Panama Grant Draft 2015-07-27.docx
 258099 Jul 25  2015 NSF Panama Grant Draft 2015-07-25.docx
 265654 Jul 22  2015 NSF Panama Grant Draft 2015-07-22.docx
```

---

# Tracking Your Science

### What are the potential shortcomings of this approach?

```
3111123 Aug  2  2015 NSF_Project_Description_Aug2-4PM.pdf
2824531 Aug  2  2015 NSF Panama Grant Draft 2015-08-01_v2_JN.docx
2366621 Aug  2  2015 NSF EAH final revisions 2015-08-02_fin.docx
1581164 Aug  2  2015 NSF Panama Grant Draft 2015-08-01_TAH_working.docx
2908139 Aug  2  2015 NSF EAH revisions 2015-08-02.docx
  32538 Aug  2  2015 Hypothesis4edited by Aafke 010815 RAR.docx
 164430 Aug  1  2015 Hypothesis4edited by Aafke 010815.docx
2905973 Aug  1  2015 NSF Panama Grant Draft 2015-08-01.docx
 157777 Aug  1  2015 Chemical volatile objectives.docx
 708115 Jul 31  2015 NSF Panama Grant Draft 2015-07-31 alt fig.docx
1086603 Jul 31  2015 NSF Panama Grant Draft 2015-07-30 alt fig.docx
 157636 Jul 31  2015 H2 Text.docx
 134487 Jul 30  2015 H1 Text.docx
 688048 Jul 30  2015 NSF Panama Grant Draft 2015-07-30.docx
 509467 Jul 30  2015 NSF Panama Grant Draft 2015-07-29.docx
 482474 Jul 28  2015 NSF Panama Grant Draft 2015-07-28.docx
 241316 Jul 27  2015 NSF Panama Grant Draft 2015-07-27.docx
 258099 Jul 25  2015 NSF Panama Grant Draft 2015-07-25.docx
 265654 Jul 22  2015 NSF Panama Grant Draft 2015-07-22.docx
```

---

# Tracking Your Science

### With a version control system, the file history looks a lot different.

* Example: A grant proposal involving 4 contributors:

--

```
 18537 Aug  1 2016 aim-models.tex
 13341 Aug  1 2016 aim-plants.tex
 16758 Aug  1 2016 aim-tests.tex
 17750 Aug  1 2016 background.tex
 11421 Aug  1 2016 prior-etc.tex
374875 Aug  1 2016 main.pdf
  1269 Jul 28 2016 approach.tex
  5049 Jul 28 2016 broader.tex
 13356 Jul 28 2016 budget.tex
  7548 Jul 28 2016 data.tex
  4630 Jul 28 2016 facilities.tex
  5283 Jul 28 2016 nsflayout.sty
  6124 Jul 28 2016 postdoc.tex
 35216 Jul 28 2016 pps.pdf
 84726 Jul 28 2016 pps.svg
 88594 Jul 28 2016 refs.bib
```

---

# Version Control

## A version control system improves organization and collaboration

Let me show you a little animation...

```

```


---

# Version Control

### A version control system improves organization and collaboration
<div style="text-align:center"><img src="images/gource_screen.png" alt="gource" style="width: 500px;" /></div>

--

Imagine a project with many more collaborators...

---

# Version Control

A snapshot of [RevBayes](https://github.com/revbayes/revbayes): 25 contributors; 40,470 files; 5,177,357 lines of code

<div style="text-align:center"><img src="images/gource_screen-rb.png" alt="gource" style="width: 440px;" /></div>

---

# Version Control Systems

## What options are there for versioning projects?

--

* The most common in biology are Git and SVN (and historically CVS).

* There are [many others](https://en.wikipedia.org/wiki/Comparison_of_version_control_software)

---

# Version Control Systems

## What do they allow you to do?

--

* Track changes made to each file

--

* Revert the entire project or a single file to a previous version

--

* Review changes made over time

--

* View who modified the file (and `blame` them for something if necessary)

--

* Collaborate with others without overwriting their work or risk file corruption, etc.

--

* Have multiple independent **_branches_** of the same repository and make changes without
effecting others' work.

--

* And more...

---

# &#x2665; Git &#x2665;

## Git manages a filesystem as a set of snapshots 

Snapshots are called _commits_ 

<div style="text-align:center"><img src="https://git-scm.com/book/en/v2/book/01-introduction/images/snapshots.png" alt="gource" width="700" /></div>

(image source [https://git-scm.com](https://git-scm.com/book/en/v2/Getting-Started-Git-Basics))

---

# &#x2665; Git &#x2665;

## Almost every interaction with Git happens locally

<div style="text-align:center"><img src="https://git-scm.com/book/en/v2/book/01-introduction/images/areas.png" alt="gource" width="650" /></div>

(image source [https://git-scm.com](https://git-scm.com/book/en/v2/Getting-Started-Git-Basics))


---

# &#x2665; Git &#x2665;

### A remote host adds an additional level to a Git repository

Also, allows for collaboration and back-up.

--

<div style="text-align:center"><img src="https://2.bp.blogspot.com/-w2Z0FGVzygM/WBoAt2kuuCI/AAAAAAAABtU/voWHlUobfl8jB5-5NuSZD0BlN3A9jdMtgCLcB/s1600/basic-remote-workflow.png" alt="Drawing" width="425" /></div>

(image source [https://www.git-tower.com](https://www.git-tower.com/learn/git/ebook/en/command-line/remote-repositories/introduction))

---

# &#x2665; Git &#x2665;

## Remote Git repository hosting services

### There are several options for remote hosts

--

* You can set up your own server and host all of your repositories privately using [Gitolite](https://github.com/sitaramc/gitolite/) or [Gitosis](https://git-scm.com/book/no-nb/v1/Git-on-the-Server-Gitosis) (not recommended)

--

* You can use a web-based Git host

--

    * [GitHub](https://github.com/) <img src="https://assets-cdn.github.com/images/modules/logos_page/Octocat.png" alt="Drawing" width="35" />: free public repositories & paid private repositories, with repositories over 1 GB discouraged

--
    * [Bitbucket](https://bitbucket.org/) <img src="http://image.flaticon.com/icons/png/512/37/37104.png" alt="Drawing" width="25" />: unlimited free public & free private 
    repositories, limited at 1-2 GB/repo (register with your *.edu* email to get unlimited 
    collaborators on private repositories) 
--
    * [GitLab](https://about.gitlab.com/) <img src="https://gitlab.com/uploads/project/avatar/2005690/gitlab-logo-square.png" alt="Drawing" width="25" />: unlimited free public & free private repositories with no limits on collaborators, but limited at 10 GB/repo 
---

# &#x2665; Git &#x2665;

## Despite the &#x2665;, Git does have some limitations

Most relevant to this course and our fields are:

* **_Repository size_**: If your repository gets very large, working within it can be a problem.
The network speed will be the main bottleneck. This is why the online Git hosts discourage repos
over 1-2 GB. 

* **_File size_**: A single large file can be problematic, particularly if it is frequently being
modified. This can also lead to swollen repositories. GitHub will not allow any file over 100 MB.

* **_File type_**: Git works best with text files, you can have binary files in your 
repository, but you lose some functionality of version control (like `diff`). Binary files
are also often very large. Thus, it is 
recommended that you keep binary files to a minimum. (This means that it is not practical
to use Git to collaborate on MS Word documents.) 


---

# &#x2665; Git &#x2665;

## Demo: Clone a Repository

You become familiar with the concepts by using Git. We will start by cloning [ipyrad](https://github.com/dereneaton/ipyrad).

ipyrad is a python program for interactive assembly and analysis of RAD-seq data sets

--

Start by going to [https://github.com/dereneaton/ipyrad](https://github.com/dereneaton/ipyrad) to get the URL.

--

```
$ git clone git@github.com:dereneaton/ipyrad.git
Cloning into 'ipyrad'...
remote: Counting objects: 11341, done.
remote: Compressing objects: 100% (304/304), done.
remote: Total 11341 (delta 199), reused 0 (delta 0), pack-reused 11037
Receiving objects: 100% (11341/11341), 70.75 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (8275/8275), done.
Checking connectivity... done.
```

Note that using the url `git@github...` instead of `https://github...` uses SSH 
if you haven't configured your key yet, this may not work.

---

# &#x2665; Git &#x2665;

### Some helpful commands for this cloned repository

--

Always `pull` from the repository before doing anything with the contents

```
$ git pull origin master
```

--

Check the `status` of your local files

```
$ git status
```

--

See the `log` of the snapshots and their commit messages

```
$ git log
```

--

Compare the `diff`erences a file you have modified and the last commit

```
$ git diff README.rst
```

---

# &#x2665; Git &#x2665;

### Some helpful commands for this cloned repository

Replace a file you modified with the most recent commit using `checkout`

```
$ git checkout README.rst
```

--

Find out which `branch` you're on

```
$ git branch
```
--

Change to a different branch

```
$ git checkout h5step7
```

--

Pull to update from `h5step7`

```
$ git pull origin h5step7
```

---

# &#x2665; Git &#x2665;

## What can you do with someone else's GitHub repository?

### If you do not have _push rights_ 

--

* You can only clone the repository and make changes locally

--

* You can **_fork_** their repository and develop it independently

--

* You can submit a **_pull request_** to their repository if you want to contribute to the original project

--

* You can contact the owner of the repository and ask them to include you as a 
contributor and give you push rights (it is recommended that you discuss the nature of your collaboration with them first)

---

# &#x2665; Git &#x2665;

## Creating a Repository

Let's create a new Git repository and host it on GitHub using our accounts (for the demo, 
I will create the repo in the [EEOB-BioData](https://github.com/EEOB-BioData) organization
so that everyone will have easy access to it).

<br>

--

But first, let's tell Git who you are

```
$ git config --global user.name "Ada Lovelace"
$ git config --global user.email "adal@analyticalengine.com"
```


---

# &#x2665; Git &#x2665;

### Some helpful commands for your new repository

--

Initialize a new Git repository

```
$ git init
```

--

After a file has been added or modified, you can stage the file 

```
$ git add README.md
```

--

Commit the file to your local repository and write a message

```
$ git commit -m "initial commit (README.md)"
```


---

# &#x2665; Git &#x2665;

### Some helpful commands for your new repository

After you have made your commit, the repository is up-to-date locally. Next you need to connect
your local repo to the remote.

--

Add the remote

```
$ git remote add origin git@github.com:username/repo-name.git
```

--

Push your snapshot to the remote

```
$ git push -u origin master
```

---

# For Thursday

### Your Challenge (not graded)

* Write some notes on Chapter 5 in [Bioinformatics Data Skills](http://vincebuffalo.org/book/) (or anything that you found interesting, confusing, or worth sharing) in 
a text file (you can use Markdown or HTML if you like).

* Add this file to your repository and push the changes to the remote on GitHub

* Post the URL to your repository to the `#resources` channel on Slack

* If you have any problems just post a message on the `#scripting_help` channel

    </textarea>
    <script src="https://gnab.github.io/remark/downloads/remark-latest.min.js" type="text/javascript">
    </script>
    <script>
      var hljs = remark.highlighter.engine;
    </script>
    <script src="remark.language.js"></script>
    <script>
      var slideshow = remark.create({
          highlightStyle: 'monokai',
          highlightLanguage: 'tex',
          highlightLines: true
        }) ;
    </script>
    <script>
      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', 'UA-44561333-1']);
      _gaq.push(['_trackPageview']);

      (function() {
        var ga = document.createElement('script');
        ga.src = 'https://ssl.google-analytics.com/ga.js';
        var s = document.scripts[0];
        s.parentNode.insertBefore(ga, s);
      }());
    </script>
  </body>
</html>
